Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 787 |
| Missing cells | 1159 |
| Missing cells (%) | 9.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 98.5 KiB |
| Average record size in memory | 128.2 B |
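The percentage and per-record figures in the table above follow directly from the raw counts. The sketch below redoes that arithmetic using only values from this table; the variable names are illustrative and are not pandas-profiling internals.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 787
missing_cells = 1159
total_size_kib = 98.5

# Missing cells (%) is the share of all cells (rows x columns) that are empty.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size is the total memory footprint divided by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"{avg_record_bytes:.1f} B per record")
```

Both results round to the figures reported in the table (9.2% and 128.2 B).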
Variable types
| NUM | 12 |
|---|---|
| CAT | 4 |
Reproduction
| Analysis started | 2020-09-12 13:26:47.120394 |
|---|---|
| Analysis finished | 2020-09-12 13:28:17.488046 |
| Duration | 1 minute and 30.37 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| City has a high cardinality: 772 distinct values | High cardinality |
|---|---|
| Popuation [2001] is highly correlated with Population [2011] and 1 other fields | High correlation |
| Population [2011] is highly correlated with Popuation [2001] and 1 other fields | High correlation |
| Female Population is highly correlated with Population [2011] and 1 other fields | High correlation |
| Population [2011] has 48 (6.1%) missing values | Missing |
| Popuation [2001] has 492 (62.5%) missing values | Missing |
| Sex Ratio has 10 (1.3%) missing values | Missing |
| Median Age has 18 (2.3%) missing values | Missing |
| Avg Temp has 17 (2.2%) missing values | Missing |
| Toilets Avl has 26 (3.3%) missing values | Missing |
| Water Purity has 158 (20.1%) missing values | Missing |
| H Index has 140 (17.8%) missing values | Missing |
| Female Population has 141 (17.9%) missing values | Missing |
| # of hospitals has 15 (1.9%) missing values | Missing |
| Foreign Visitors has 90 (11.4%) missing values | Missing |
| City is uniformly distributed | Uniform |
City
Categorical
| Distinct count | 772 |
|---|---|
| Unique (%) | 98.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.1 KiB |
| Aurangabad | 3 |
|---|---|
| Ramnagar | 3 |
| Phagwara | 2 |
| Kavali | 2 |
| Tezpur | 2 |
| Other values (767) | 775 |
| Value | Count | Frequency (%) | |
| Aurangabad | 3 | 0.4% | |
| Ramnagar | 3 | 0.4% | |
| Phagwara | 2 | 0.3% | |
| Kavali | 2 | 0.3% | |
| Tezpur | 2 | 0.3% | |
| Tinsukia | 2 | 0.3% | |
| Jorhat | 2 | 0.3% | |
| Tiruppur | 2 | 0.3% | |
| Thrissur | 2 | 0.3% | |
| Miryalaguda | 2 | 0.3% | |
| Other values (762) | 765 | 97.2% |
Length
| Max length | 28 |
|---|---|
| Median length | 8 |
| Mean length | 8.363405337 |
| Min length | 3 |
State
Categorical
| Distinct count | 33 |
|---|---|
| Unique (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.1 KiB |
| Andhra Pradesh | 78 |
|---|---|
| Maharashtra | 73 |
| Uttar Pradesh | 67 |
| Tamil Nadu | 63 |
| Bihar | 51 |
| Other values (28) | 455 |
| Value | Count | Frequency (%) | |
| Andhra Pradesh | 78 | 9.9% | |
| Maharashtra | 73 | 9.3% | |
| Uttar Pradesh | 67 | 8.5% | |
| Tamil Nadu | 63 | 8.0% | |
| Bihar | 51 | 6.5% | |
| Karnataka | 44 | 5.6% | |
| Madhya Pradesh | 43 | 5.5% | |
| West Bengal | 42 | 5.3% | |
| Gujarat | 41 | 5.2% | |
| Kerala | 39 | 5.0% | |
| Other values (23) | 246 | 31.3% |
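The Frequency (%) column in the table above is each count divided by the number of observations (787). A small sketch of that calculation for the three most common states, with counts copied from the table; the dictionary name is illustrative.

```python
# Number of observations, from the "Dataset statistics" table.
n_observations = 787

# Counts copied from the first three rows of the State table.
state_counts = {"Andhra Pradesh": 78, "Maharashtra": 73, "Uttar Pradesh": 67}

# Frequency (%) = 100 * count / number of observations, rounded to one decimal.
frequency_pct = {
    state: round(100 * count / n_observations, 1)
    for state, count in state_counts.items()
}

for state, pct in frequency_pct.items():
    print(f"{state}: {pct}%")
```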
Length
| Max length | 27 |
|---|---|
| Median length | 10 |
| Mean length | 9.673443456 |
| Min length | 3 |
Type
Categorical
| Distinct count | 37 |
|---|---|
| Unique (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.1 KiB |
| C-1T | 269 |
|---|---|
| M | 236 |
| M.Cl | 59 |
| MPUA | 44 |
| M.B | 28 |
| Other values (32) | 151 |
| Value | Count | Frequency (%) | |
| C-1T | 269 | 34.2% | |
| M | 236 | 30.0% | |
| M.Cl | 59 | 7.5% | |
| MPUA | 44 | 5.6% | |
| M.B | 28 | 3.6% | |
| UA | 28 | 3.6% | |
| N.P.P | 13 | 1.7% | |
| T.M.C | 13 | 1.7% | |
| N.P | 10 | 1.3% | |
| C.T | 9 | 1.1% | |
| Other values (27) | 78 | 9.9% |
Length
| Max length | 14 |
|---|---|
| Median length | 4 |
| Mean length | 3.015247776 |
| Min length | 1 |
Population [2011]
Real number (ℝ≥0)
| Distinct count | 730 |
|---|---|
| Unique (%) | 98.8% |
| Missing | 48 |
| Missing (%) | 6.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 310283.4167794317 |
|---|---|
| Minimum | 36776.0 |
| Maximum | 12442373.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 36776 |
|---|---|
| 5-th percentile | 38989.8 |
| Q1 | 52550 |
| median | 79106 |
| Q3 | 237476.5 |
| 95-th percentile | 1121730.6 |
| Maximum | 12442373 |
| Range | 12405597 |
| Interquartile range (IQR) | 184926.5 |
Descriptive statistics
| Standard deviation | 887484.8744 |
|---|---|
| Coefficient of variation (CV) | 2.860239466 |
| Kurtosis | 92.3470521 |
| Mean | 310283.4168 |
| Median Absolute Deviation (MAD) | 36108 |
| Skewness | 8.579129273 |
| Sum | 229299445 |
| Variance | 7.876294023e+11 |
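Several entries in the quantile and descriptive tables above are derived from the others. The sketch below recomputes them from the reported minimum, maximum, quartiles, mean and standard deviation; the variable names are illustrative.

```python
# Values copied from the quantile and descriptive tables for this variable.
minimum, maximum = 36776, 12442373
q1, q3 = 52550, 237476.5
mean, std = 310283.4168, 887484.8744

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"CV = {cv:.4f}, variance = {variance:.4e}")
```

A CV well above 1 together with the large skewness indicates that a few very large cities dominate the spread of this variable.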
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 70777 | 2 | 0.3% | |
| 49985 | 2 | 0.3% | |
| 38554 | 2 | 0.3% | |
| 61632 | 2 | 0.3% | |
| 206167 | 2 | 0.3% | |
| 42461 | 2 | 0.3% | |
| 45858 | 2 | 0.3% | |
| 65232 | 2 | 0.3% | |
| 37802 | 2 | 0.3% | |
| 44314 | 1 | 0.1% | |
| Other values (720) | 720 | 91.5% | |
| (Missing) | 48 | 6.1% |
| Value | Count | Frequency (%) | |
| 36776 | 1 | 0.1% | |
| 36805 | 1 | 0.1% | |
| 36828 | 1 | 0.1% | |
| 36947 | 1 | 0.1% | |
| 36954 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 12442373 | 1 | 0.1% | |
| 11007835 | 1 | 0.1% | |
| 8436675 | 1 | 0.1% | |
| 6809970 | 1 | 0.1% | |
| 5570585 | 1 | 0.1% |
Popuation [2001]
Real number (ℝ≥0)
| Distinct count | 292 |
|---|---|
| Unique (%) | 99.0% |
| Missing | 492 |
| Missing (%) | 62.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 532045.1322033898 |
|---|---|
| Minimum | 29354.0 |
| Maximum | 11978450.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 29354 |
|---|---|
| 5-th percentile | 99573 |
| Q1 | 169432 |
| median | 236600 |
| Q3 | 474585 |
| 95-th percentile | 1410133.1 |
| Maximum | 11978450 |
| Range | 11949096 |
| Interquartile range (IQR) | 305153 |
Descriptive statistics
| Standard deviation | 1067831.381 |
|---|---|
| Coefficient of variation (CV) | 2.007031577 |
| Kurtosis | 65.66343066 |
| Mean | 532045.1322 |
| Median Absolute Deviation (MAD) | 97282 |
| Skewness | 7.26489863 |
| Sum | 156953314 |
| Variance | 1.140263858e+12 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 271811 | 2 | 0.3% | |
| 296662 | 2 | 0.3% | |
| 228175 | 2 | 0.3% | |
| 260906 | 1 | 0.1% | |
| 269122 | 1 | 0.1% | |
| 165212 | 1 | 0.1% | |
| 310967 | 1 | 0.1% | |
| 231515 | 1 | 0.1% | |
| 166125 | 1 | 0.1% | |
| 426674 | 1 | 0.1% | |
| Other values (282) | 282 | 35.8% | |
| (Missing) | 492 | 62.5% |
| Value | Count | Frequency (%) | |
| 29354 | 1 | 0.1% | |
| 73455 | 1 | 0.1% | |
| 79190 | 1 | 0.1% | |
| 79393 | 1 | 0.1% | |
| 81503 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 11978450 | 1 | 0.1% | |
| 9879172 | 1 | 0.1% | |
| 4572876 | 1 | 0.1% | |
| 4343645 | 1 | 0.1% | |
| 4301326 | 1 | 0.1% |
Sex Ratio
Real number (ℝ≥0)
| Distinct count | 160 |
|---|---|
| Unique (%) | 20.6% |
| Missing | 10 |
| Missing (%) | 1.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 905.7129987129987 |
|---|---|
| Minimum | 818.0 |
| Maximum | 1042.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 818 |
|---|---|
| 5-th percentile | 846 |
| Q1 | 877 |
| median | 906 |
| Q3 | 928 |
| 95-th percentile | 968 |
| Maximum | 1042 |
| Range | 224 |
| Interquartile range (IQR) | 51 |
Descriptive statistics
| Standard deviation | 37.01854158 |
|---|---|
| Coefficient of variation (CV) | 0.04087226488 |
| Kurtosis | -0.08383561613 |
| Mean | 905.7129987 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | 0.2218225024 |
| Sum | 703739 |
| Variance | 1370.372421 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 923 | 19 | 2.4% | |
| 871 | 15 | 1.9% | |
| 922 | 14 | 1.8% | |
| 872 | 14 | 1.8% | |
| 929 | 11 | 1.4% | |
| 916 | 11 | 1.4% | |
| 882 | 11 | 1.4% | |
| 890 | 11 | 1.4% | |
| 917 | 10 | 1.3% | |
| 869 | 10 | 1.3% | |
| Other values (150) | 651 | 82.7% |
| Value | Count | Frequency (%) | |
| 818 | 1 | 0.1% | |
| 820 | 1 | 0.1% | |
| 821 | 1 | 0.1% | |
| 822 | 1 | 0.1% | |
| 823 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1042 | 1 | 0.1% | |
| 1036 | 1 | 0.1% | |
| 1031 | 1 | 0.1% | |
| 1023 | 1 | 0.1% | |
| 1019 | 1 | 0.1% |
Median Age
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | 1.3% |
| Missing | 18 |
| Missing (%) | 2.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.18335500650195 |
|---|---|
| Minimum | 23.0 |
| Maximum | 32.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 23 |
|---|---|
| 5-th percentile | 23 |
| Q1 | 24 |
| median | 26 |
| Q3 | 28 |
| 95-th percentile | 29 |
| Maximum | 32 |
| Range | 9 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.113062962 |
|---|---|
| Coefficient of variation (CV) | 0.08070252884 |
| Kurtosis | -1.015686277 |
| Mean | 26.18335501 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1090339961 |
| Sum | 20135 |
| Variance | 4.465035083 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 28 | 117 | 14.9% | |
| 25 | 116 | 14.7% | |
| 24 | 108 | 13.7% | |
| 29 | 108 | 13.7% | |
| 26 | 101 | 12.8% | |
| 27 | 98 | 12.5% | |
| 23 | 97 | 12.3% | |
| 30 | 14 | 1.8% | |
| 31 | 8 | 1.0% | |
| 32 | 2 | 0.3% | |
| (Missing) | 18 | 2.3% |
| Value | Count | Frequency (%) | |
| 23 | 97 | 12.3% | |
| 24 | 108 | 13.7% | |
| 25 | 116 | 14.7% | |
| 26 | 101 | 12.8% | |
| 27 | 98 | 12.5% |
| Value | Count | Frequency (%) | |
| 32 | 2 | 0.3% | |
| 31 | 8 | 1.0% | |
| 30 | 14 | 1.8% | |
| 29 | 108 | 13.7% | |
| 28 | 117 | 14.9% |
Avg Temp
Real number (ℝ≥0)
| Distinct count | 26 |
|---|---|
| Unique (%) | 3.4% |
| Missing | 17 |
| Missing (%) | 2.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.941558441558442 |
|---|---|
| Minimum | 5.0 |
| Maximum | 40.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 13.9 |
| Q1 | 28 |
| median | 31 |
| Q3 | 36 |
| 95-th percentile | 40 |
| Maximum | 40 |
| Range | 35 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 6.968288763 |
|---|---|
| Coefficient of variation (CV) | 0.2252080733 |
| Kurtosis | 3.116849321 |
| Mean | 30.94155844 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -1.489797407 |
| Sum | 23825 |
| Variance | 48.55704828 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 30 | 57 | 7.2% | |
| 37 | 53 | 6.7% | |
| 26 | 53 | 6.7% | |
| 29 | 50 | 6.4% | |
| 33 | 49 | 6.2% | |
| 40 | 47 | 6.0% | |
| 28 | 47 | 6.0% | |
| 25 | 47 | 6.0% | |
| 38 | 45 | 5.7% | |
| 35 | 45 | 5.7% | |
| Other values (16) | 277 | 35.2% |
| Value | Count | Frequency (%) | |
| 5 | 5 | 0.6% | |
| 6 | 4 | 0.5% | |
| 7 | 2 | 0.3% | |
| 8 | 7 | 0.9% | |
| 9 | 4 | 0.5% |
| Value | Count | Frequency (%) | |
| 40 | 47 | 6.0% | |
| 39 | 35 | 4.4% | |
| 38 | 45 | 5.7% | |
| 37 | 53 | 6.7% | |
| 36 | 30 | 3.8% |
SWM
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 4 |
| Missing (%) | 0.5% |
| Memory size | 6.1 KiB |
| HIGH | 272 |
|---|---|
| LOW | 260 |
| MEDIUM | 251 |
| Value | Count | Frequency (%) | |
| HIGH | 272 | 34.6% | |
| LOW | 260 | 33.0% | |
| MEDIUM | 251 | 31.9% | |
| (Missing) | 4 | 0.5% |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.302414231 |
| Min length | 3 |
Toilets Avl
Real number (ℝ≥0)
| Distinct count | 107 |
|---|---|
| Unique (%) | 14.1% |
| Missing | 26 |
| Missing (%) | 3.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 96.08672798948751 |
|---|---|
| Minimum | 50.0 |
| Maximum | 227.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 54 |
| Q1 | 70 |
| median | 92 |
| Q3 | 119 |
| 95-th percentile | 146 |
| Maximum | 227 |
| Range | 177 |
| Interquartile range (IQR) | 49 |
Descriptive statistics
| Standard deviation | 30.5329907 |
|---|---|
| Coefficient of variation (CV) | 0.3177649124 |
| Kurtosis | 0.4551361719 |
| Mean | 96.08672799 |
| Median Absolute Deviation (MAD) | 24 |
| Skewness | 0.6519525763 |
| Sum | 73122 |
| Variance | 932.263521 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 65 | 15 | 1.9% | |
| 90 | 15 | 1.9% | |
| 66 | 15 | 1.9% | |
| 100 | 14 | 1.8% | |
| 91 | 14 | 1.8% | |
| 61 | 14 | 1.8% | |
| 70 | 13 | 1.7% | |
| 99 | 13 | 1.7% | |
| 92 | 12 | 1.5% | |
| 57 | 12 | 1.5% | |
| Other values (97) | 624 | 79.3% | |
| (Missing) | 26 | 3.3% |
| Value | Count | Frequency (%) | |
| 50 | 6 | 0.8% | |
| 51 | 6 | 0.8% | |
| 52 | 9 | 1.1% | |
| 53 | 10 | 1.3% | |
| 54 | 10 | 1.3% |
| Value | Count | Frequency (%) | |
| 227 | 1 | 0.1% | |
| 219 | 1 | 0.1% | |
| 217 | 1 | 0.1% | |
| 215 | 1 | 0.1% | |
| 212 | 1 | 0.1% |
Water Purity
Real number (ℝ≥0)
| Distinct count | 101 |
|---|---|
| Unique (%) | 16.1% |
| Missing | 158 |
| Missing (%) | 20.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 150.37360890302068 |
|---|---|
| Minimum | 100.0 |
| Maximum | 200.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 100 |
|---|---|
| 5-th percentile | 105 |
| Q1 | 125 |
| median | 150 |
| Q3 | 176 |
| 95-th percentile | 195 |
| Maximum | 200 |
| Range | 100 |
| Interquartile range (IQR) | 51 |
Descriptive statistics
| Standard deviation | 29.06376698 |
|---|---|
| Coefficient of variation (CV) | 0.1932770463 |
| Kurtosis | -1.244995104 |
| Mean | 150.3736089 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | -0.01605517129 |
| Sum | 94585 |
| Variance | 844.7025508 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 134 | 12 | 1.5% | |
| 129 | 12 | 1.5% | |
| 172 | 11 | 1.4% | |
| 116 | 11 | 1.4% | |
| 105 | 11 | 1.4% | |
| 115 | 11 | 1.4% | |
| 125 | 10 | 1.3% | |
| 173 | 10 | 1.3% | |
| 176 | 9 | 1.1% | |
| 186 | 9 | 1.1% | |
| Other values (91) | 523 | 66.5% | |
| (Missing) | 158 | 20.1% |
| Value | Count | Frequency (%) | |
| 100 | 3 | 0.4% | |
| 101 | 5 | 0.6% | |
| 102 | 8 | 1.0% | |
| 103 | 3 | 0.4% | |
| 104 | 3 | 0.4% |
| Value | Count | Frequency (%) | |
| 200 | 5 | 0.6% | |
| 199 | 8 | 1.0% | |
| 198 | 6 | 0.8% | |
| 197 | 9 | 1.1% | |
| 196 | 1 | 0.1% |
H Index
Real number (ℝ≥0)
| Distinct count | 647 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 140 |
| Missing (%) | 17.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4970691911347755 |
|---|---|
| Minimum | 0.0030743436420811454 |
| Maximum | 0.9997737154818801 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 0.003074343642 |
|---|---|
| 5-th percentile | 0.04402641715 |
| Q1 | 0.2385864179 |
| median | 0.5070035548 |
| Q3 | 0.7525169391 |
| 95-th percentile | 0.9474487944 |
| Maximum | 0.9997737155 |
| Range | 0.9966993718 |
| Interquartile range (IQR) | 0.5139305211 |
Descriptive statistics
| Standard deviation | 0.2934213852 |
|---|---|
| Coefficient of variation (CV) | 0.5903029003 |
| Kurtosis | -1.262279578 |
| Mean | 0.4970691911 |
| Median Absolute Deviation (MAD) | 0.2518362572 |
| Skewness | -0.001863978178 |
| Sum | 321.6037667 |
| Variance | 0.08609610929 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0.008755664986 | 1 | 0.1% | |
| 0.04044734833 | 1 | 0.1% | |
| 0.43898665 | 1 | 0.1% | |
| 0.09457941448 | 1 | 0.1% | |
| 0.2622769277 | 1 | 0.1% | |
| 0.7621123303 | 1 | 0.1% | |
| 0.1750121394 | 1 | 0.1% | |
| 0.650818141 | 1 | 0.1% | |
| 0.6792644647 | 1 | 0.1% | |
| 0.9076620147 | 1 | 0.1% | |
| Other values (637) | 637 | 80.9% | |
| (Missing) | 140 | 17.8% |
| Value | Count | Frequency (%) | |
| 0.003074343642 | 1 | 0.1% | |
| 0.004921502153 | 1 | 0.1% | |
| 0.005168423715 | 1 | 0.1% | |
| 0.006412195407 | 1 | 0.1% | |
| 0.008755664986 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 0.9997737155 | 1 | 0.1% | |
| 0.999185528 | 1 | 0.1% | |
| 0.9991392004 | 1 | 0.1% | |
| 0.9991114073 | 1 | 0.1% | |
| 0.9987569952 | 1 | 0.1% |
Female Population
Real number (ℝ≥0)
| Distinct count | 645 |
|---|---|
| Unique (%) | 99.8% |
| Missing | 141 |
| Missing (%) | 17.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 291001.13931888546 |
|---|---|
| Minimum | 30913.0 |
| Maximum | 10924403.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 30913 |
|---|---|
| 5-th percentile | 35146.75 |
| Q1 | 45144.5 |
| median | 83067.5 |
| Q3 | 220677.25 |
| 95-th percentile | 978485.25 |
| Maximum | 10924403 |
| Range | 10893490 |
| Interquartile range (IQR) | 175532.75 |
Descriptive statistics
| Standard deviation | 835434.7538 |
|---|---|
| Coefficient of variation (CV) | 2.870898567 |
| Kurtosis | 80.30715994 |
| Mean | 291001.1393 |
| Median Absolute Deviation (MAD) | 45648 |
| Skewness | 8.09154773 |
| Sum | 187986736 |
| Variance | 6.979512278e+11 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 35139 | 2 | 0.3% | |
| 55061 | 1 | 0.1% | |
| 392871 | 1 | 0.1% | |
| 218707 | 1 | 0.1% | |
| 971083 | 1 | 0.1% | |
| 482981 | 1 | 0.1% | |
| 38025 | 1 | 0.1% | |
| 46058 | 1 | 0.1% | |
| 204837 | 1 | 0.1% | |
| 108346 | 1 | 0.1% | |
| Other values (635) | 635 | 80.7% | |
| (Missing) | 141 | 17.9% |
| Value | Count | Frequency (%) | |
| 30913 | 1 | 0.1% | |
| 31263 | 1 | 0.1% | |
| 32277 | 1 | 0.1% | |
| 32694 | 1 | 0.1% | |
| 32973 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 10924403 | 1 | 0.1% | |
| 9444722 | 1 | 0.1% | |
| 7896728 | 1 | 0.1% | |
| 6333272 | 1 | 0.1% | |
| 4746138 | 1 | 0.1% |
# of hospitals
Real number (ℝ≥0)
| Distinct count | 76 |
|---|---|
| Unique (%) | 9.8% |
| Missing | 15 |
| Missing (%) | 1.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.84974093264249 |
|---|---|
| Minimum | 10.0 |
| Maximum | 159.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 18 |
| median | 28 |
| Q3 | 67 |
| 95-th percentile | 94 |
| Maximum | 159 |
| Range | 149 |
| Interquartile range (IQR) | 49 |
Descriptive statistics
| Standard deviation | 29.08693909 |
|---|---|
| Coefficient of variation (CV) | 0.6950327158 |
| Kurtosis | -0.6315321958 |
| Mean | 41.84974093 |
| Median Absolute Deviation (MAD) | 15.5 |
| Skewness | 0.7387807669 |
| Sum | 32308 |
| Variance | 846.0500259 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 26 | 34 | 4.3% | |
| 10 | 31 | 3.9% | |
| 13 | 28 | 3.6% | |
| 29 | 27 | 3.4% | |
| 30 | 27 | 3.4% | |
| 11 | 24 | 3.0% | |
| 20 | 23 | 2.9% | |
| 28 | 23 | 2.9% | |
| 22 | 22 | 2.8% | |
| 15 | 22 | 2.8% | |
| Other values (66) | 511 | 64.9% |
| Value | Count | Frequency (%) | |
| 10 | 31 | 3.9% | |
| 11 | 24 | 3.0% | |
| 12 | 21 | 2.7% | |
| 13 | 28 | 3.6% | |
| 14 | 19 | 2.4% |
| Value | Count | Frequency (%) | |
| 159 | 1 | 0.1% | |
| 148 | 1 | 0.1% | |
| 123 | 1 | 0.1% | |
| 110 | 1 | 0.1% | |
| 100 | 3 | 0.4% |
Foreign Visitors
Real number (ℝ≥0)
| Distinct count | 32 |
|---|---|
| Unique (%) | 4.6% |
| Missing | 90 |
| Missing (%) | 11.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1457944.992826399 |
|---|---|
| Minimum | 798.0 |
| Maximum | 4684707.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 798 |
|---|---|
| 5-th percentile | 24720 |
| Q1 | 237854 |
| median | 636502 |
| Q3 | 3104060 |
| 95-th percentile | 4684707 |
| Maximum | 4684707 |
| Range | 4683909 |
| Interquartile range (IQR) | 2866206 |
Descriptive statistics
| Standard deviation | 1664151.074 |
|---|---|
| Coefficient of variation (CV) | 1.141436119 |
| Kurtosis | -0.5868366511 |
| Mean | 1457944.993 |
| Median Absolute Deviation (MAD) | 510424 |
| Skewness | 1.027530692 |
| Sum | 1016187660 |
| Variance | 2.769398798e+12 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 237854 | 72 | 9.1% | |
| 4408916 | 68 | 8.6% | |
| 4684707 | 59 | 7.5% | |
| 3104060 | 56 | 7.1% | |
| 923737 | 46 | 5.8% | |
| 636502 | 41 | 5.2% | |
| 977479 | 39 | 5.0% | |
| 284973 | 36 | 4.6% | |
| 421365 | 34 | 4.3% | |
| 126078 | 32 | 4.1% | |
| Other values (22) | 214 | 27.2% | |
| (Missing) | 90 | 11.4% |
| Value | Count | Frequency (%) | |
| 798 | 2 | 0.3% | |
| 2769 | 2 | 0.3% | |
| 3260 | 1 | 0.1% | |
| 6394 | 10 | 1.3% | |
| 8027 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 4684707 | 59 | 7.5% | |
| 4408916 | 68 | 8.6% | |
| 3104060 | 56 | 7.1% | |
| 2379169 | 2 | 0.3% | |
| 1489500 | 31 | 3.9% |
Covid Cases
Real number (ℝ≥0)
| Distinct count | 642 |
|---|---|
| Unique (%) | 81.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6615.646759847522 |
|---|---|
| Minimum | 334 |
| Maximum | 218502 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 334 |
|---|---|
| 5-th percentile | 1939.1 |
| Q1 | 2270 |
| median | 2582 |
| Q3 | 8761 |
| 95-th percentile | 13926 |
| Maximum | 218502 |
| Range | 218168 |
| Interquartile range (IQR) | 6491 |
Descriptive statistics
| Standard deviation | 15108.10276 |
|---|---|
| Coefficient of variation (CV) | 2.283692481 |
| Kurtosis | 99.52218115 |
| Mean | 6615.64676 |
| Median Absolute Deviation (MAD) | 486 |
| Skewness | 9.241026721 |
| Sum | 5206514 |
| Variance | 228254769 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2523 | 6 | 0.8% | |
| 2208 | 5 | 0.6% | |
| 2490 | 5 | 0.6% | |
| 2220 | 4 | 0.5% | |
| 2213 | 4 | 0.5% | |
| 2576 | 4 | 0.5% | |
| 2265 | 3 | 0.4% | |
| 2430 | 3 | 0.4% | |
| 2081 | 3 | 0.4% | |
| 2323 | 3 | 0.4% | |
| Other values (632) | 747 | 94.9% |
| Value | Count | Frequency (%) | |
| 334 | 1 | 0.1% | |
| 358 | 1 | 0.1% | |
| 428 | 1 | 0.1% | |
| 438 | 1 | 0.1% | |
| 449 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 218502 | 1 | 0.1% | |
| 163115 | 1 | 0.1% | |
| 150793 | 1 | 0.1% | |
| 145606 | 2 | 0.3% | |
| 141000 | 1 | 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
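The calculation described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the code pandas-profiling runs; the function name and sample data are made up for the example.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/(n-1) factors of the covariance and the standard deviations cancel,
    so plain sums of deviations are enough.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(pearson_r(x, y))                       # perfectly linear: close to 1.0
print(pearson_r(x, [3 * v + 7 for v in y]))  # shifting and rescaling y leaves r unchanged
print(pearson_r(x, [-v for v in y]))         # reversed direction: close to -1.0
```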
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
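As a rough sketch of the description above, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. The helper names and sample data are illustrative, and tied values are assumed to receive the average of their ranks.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print(spearman_rho(x, y))  # close to 1.0: the relationship is perfectly monotonic
print(pearson_r(x, y))     # below 1.0: the relationship is not linear
```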
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
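The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements that definition literally, counting tied pairs as neither concordant nor discordant; library implementations often use a tie-adjusted variant instead, so results may differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

# 6 pairs in total: 5 concordant, 1 discordant, so tau = 4 / 6.
print(kendall_tau_a(x, y))
```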
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors | Covid Cases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mumbai | Maharashtra | M.C | 12442373.0 | 11978450.0 | 878.0 | 23.0 | 32.0 | MEDIUM | 219.0 | 150.0 | 0.700440 | 10924403.0 | 159.0 | 4408916.0 | 163115 |
| 1 | Delhi | Delhi | M.C | 11007835.0 | 9879172.0 | 858.0 | 27.0 | 30.0 | MEDIUM | 215.0 | 196.0 | 0.920018 | 9444722.0 | 148.0 | 2379169.0 | 80188 |
| 2 | Bangalore | Karnataka | MPUA | 8436675.0 | 4301326.0 | 936.0 | 28.0 | 37.0 | HIGH | 212.0 | 102.0 | 0.097085 | 7896728.0 | 123.0 | 636502.0 | 141000 |
| 3 | Hyderabad | Telangana | MPUA | 6809970.0 | 3637483.0 | 930.0 | 23.0 | 31.0 | MEDIUM | 217.0 | 118.0 | 0.827744 | 6333272.0 | 110.0 | 126078.0 | 55123 |
| 4 | Ahmedabad | Gujarat | MPUA | 5570585.0 | 3520085.0 | 852.0 | 29.0 | 25.0 | LOW | 227.0 | 109.0 | 0.847941 | 4746138.0 | 73.0 | 284973.0 | 33204 |
| 5 | Chennai | Tamil Nadu | MPUA | 4681087.0 | 4343645.0 | 904.0 | 26.0 | 31.0 | HIGH | 210.0 | 179.0 | 0.536995 | 4231703.0 | 67.0 | 4684707.0 | 145606 |
| 6 | Chennai | Tamil nadu | T | 4646732.0 | NaN | 912.0 | 26.0 | 30.0 | MEDIUM | 145.0 | 177.0 | 0.093451 | 4237820.0 | 55.0 | 4684707.0 | 145606 |
| 7 | Kolkata | West Bengal | MPUA | 4486679.0 | 4572876.0 | 945.0 | 26.0 | 37.0 | NaN | NaN | NaN | 0.473585 | 4239912.0 | 82.0 | 1489500.0 | 44957 |
| 8 | Surat | Gujarat | MPUA | 4467797.0 | 2433835.0 | NaN | 27.0 | 26.0 | NaN | NaN | NaN | 0.809334 | 3797627.0 | 98.0 | 284973.0 | 23432 |
| 9 | Pune | Maharashtra | MPUA | 3124458.0 | 2538473.0 | NaN | 29.0 | 29.0 | NaN | NaN | NaN | 0.445902 | 2743274.0 | 50.0 | 4408916.0 | 218502 |
Last rows
| | City | State | Type | Population [2011] | Popuation [2001] | Sex Ratio | Median Age | Avg Temp | SWM | Toilets Avl | Water Purity | H Index | Female Population | # of hospitals | Foreign Visitors | Covid Cases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 777 | Shahbad | Haryana | M.C | 37289.0 | NaN | 829.0 | 24.0 | 33.0 | MEDIUM | 77.0 | 171.0 | 0.789984 | 30913.0 | 28.0 | 303118.0 | 1988 |
| 778 | Puranpur | Uttar Pradesh | M.B | 37233.0 | NaN | 886.0 | 28.0 | 35.0 | MEDIUM | 66.0 | 195.0 | 0.378812 | 32988.0 | 12.0 | 3104060.0 | 2478 |
| 779 | Nelamangala | Karnataka | T.M.C | 37232.0 | NaN | 931.0 | 24.0 | 34.0 | MEDIUM | 78.0 | 134.0 | 0.382265 | 34663.0 | 19.0 | 636502.0 | 2232 |
| 780 | Lalganj | Bihar | N.A.C | 37000.0 | NaN | 919.0 | 29.0 | 36.0 | LOW | 54.0 | 168.0 | 0.289709 | 34003.0 | 19.0 | 923737.0 | 2663 |
| 781 | Nakodar | Punjab | M.Cl | 36973.0 | NaN | 873.0 | 26.0 | 31.0 | LOW | 61.0 | 171.0 | 0.265890 | 32277.0 | 14.0 | 242367.0 | 2268 |
| 782 | Lunawada | Gujarat | M | 36954.0 | NaN | 846.0 | 23.0 | 28.0 | MEDIUM | 68.0 | 103.0 | 0.035280 | 31263.0 | 19.0 | 284973.0 | 1944 |
| 783 | Murshidabad | West Bengal | M | 36947.0 | NaN | 945.0 | 23.0 | 36.0 | MEDIUM | 62.0 | 136.0 | 0.056394 | 34915.0 | 22.0 | 1489500.0 | 2172 |
| 784 | Mahe | Puducherry | M | 36828.0 | NaN | 1019.0 | 28.0 | 28.0 | HIGH | 98.0 | 138.0 | 0.066752 | 37528.0 | 27.0 | 106153.0 | 2851 |
| 785 | Lanka | Assam | M.B | 36805.0 | NaN | 900.0 | 24.0 | 6.0 | MEDIUM | 63.0 | 145.0 | 0.627556 | 33125.0 | 15.0 | 24720.0 | 2158 |
| 786 | Rudauli | Uttar Pradesh | M.B | 36776.0 | NaN | 889.0 | 25.0 | 37.0 | HIGH | 51.0 | 181.0 | 0.313383 | 32694.0 | 30.0 | 3104060.0 | 2220 |